Using Background Contextual Knowledge for Document Representation

Authors

Arkadi Kosmynin

Ian Davidson

Abstract

We describe an approach to document representation that captures contextual dependencies between terms in a corpus and uses these dependencies to represent documents. We applied our representation scheme to automatic document categorisation on the Reuters test set of documents. We achieve a precision-recall break-even point of 84%, which is comparable to the best known published results. Our approach acts as a feature selection technique and is an alternative to applying techniques from machine learning and numerical taxonomy.

Similar resources

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Document Clustering with Explicit Semantic Analysis (ESA)

Document clustering has recently become a vital approach as the number of documents on the web and in proprietary repositories has increased in an unprecedented manner. Documents written in human language generally contain some context, and word usage depends mainly on that context; recently, researchers have attempted to enrich document representation via external knowledge bases. This...

Around the Tables – Contextual Factors in Healthcare Coverage Decisions Across Western Europe

Background: Across Western Europe, procedures and formalised criteria for taking decisions on the coverage (inclusion in the benefits basket or equivalent) of healthcare technologies vary substantially. In the decision documents, which present the justification of, or the rationale for, these decisions, national healthcare institutes ma...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Knowledge Management through Content Interpretation

The improved performance of computer-based text analysis represents a major step forward for knowledge management. Reliable text interpretation allows focus to be placed upon the content of documents, rather than just the document wrapping, and this helps to emphasise the fundamental difference between knowledge management and document management. It is not uncommon for companies who wish to jo...

Publication date: 1996
